A Corpus-Based Prosodic Modeling Method for Mandarin and Min-Nan Text-to-Speech Conversions

نویسنده

  • Sin-Horng Chen
چکیده

This talk gives an introduction to a recurrent neural network (RNN) based prosody synthesis method for both Mandarin and Min-Nan text-tospeech (TTS) conversions. The method uses a fourlayer RNN to model the dependency of output prosodic information and input linguistic information. Main advantages of the method are the capability of learning many human’s prosody pronunciation rules automatically and the relatively short time of system development. Two variations of the baseline RNN prosody synthesis method are also discussed. One uses an additional fuzzy-neural network to infer some fuzzy rules of affections from high-level linguistic features for assisting in the RNN prosody generation. The other uses additional statistical models of prosodic parameter to remove some affecting factors of linguistic features for reducing the load of the RNN.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prosodic modeling in large vocabulary Mandarin speech recognition

The issue of incorporating prosodic information into speech recognition processes has emerged in recent years. In this work we present a complete framework for Mandarin speech recognition with prosodic modeling considering two-level hierarchical prosodic information for Mandarin Chinese. We developed a GMM-based, a decision-tree-based, and a hybrid approach. The best improvements in character r...

متن کامل

Unsupervised prosody labeling for constructing Mandarin TTS

This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs an unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented prosodic constit...

متن کامل

Multilingual Speech Corpora for TTS System Development

In this paper, four speech corpora collected in the Speech Lab of NCTU in recent years are discussed. They include a Mandarin treebank speech corpus, a Min-Nan speech corpus, a Hakka speech corpus, and a Chinese-English mixed speech corpus. Currently, they are used separately to develop a corpus-based Mandarin TTS system, a Min-Nan TTS system, a Hakka TTS system, and a Chinese-English bilingual...

متن کامل

Toward Constructing A Multilingual Speech Corpus for Taiwanese (Min-nan), Hakka, and Mandarin Chinese

The Formosa speech database (ForSDat) is a multilingual speech corpus collected at Chang Gung University and sponsored by the National Science Council of Taiwan. It is expected that a multilingual speech corpus will be collected, covering the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin. This 3-year project has the goal of collecting a phonetically ab...

متن کامل

A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese

This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing “phonetically rich” and “prosodically ric...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000